On risks of using a high performance hashing scheme with common universal classes
Abstract
The contribution of this thesis is a mathematical analysis of a high-performance hashing scheme called cuckoo hashing when combined with two very simple and efficient classes of functions that we refer to as the multiplicative class and the linear class, respectively. We prove that cuckoo hashing tends to work badly with these classes. In order to show this, we investigate how the inner structure of such functions influences the behavior of the cuckoo scheme when a set S of keys is inserted into initially empty tables. Cuckoo hashing uses two tables of size m each. It is known that the insertion of an arbitrary set S of size n = (1 − δ)m for an arbitrary constant δ ∈ (0, 1) (which yields a load factor n/(2m) = (1 − δ)/2, i.e. below 1/2) fails with probability O(1/n) if the hash functions are chosen from an Ω(log n)-wise independent class. This yields expected amortized constant time for a single insertion. In contrast, we prove lower bounds of the following kind: If S is a set of size n = m/2 chosen uniformly at random (leading to a load factor of only 1/4 (!)), then the insertion of S fails with probability Ω(1), or even with probability 1 − o(1), if the hash functions are chosen from either the multiplicative or the linear class. This answers an open question that was already raised by the inventors of cuckoo hashing, Pagh and Rodler, who observed in experiments that cuckoo hashing behaves badly when combined with the multiplicative class. Our results implicitly show that pairwise independence is not sufficient for a hash class to work well with cuckoo hashing. Moreover, our work exemplifies that a recent result of Mitzenmacher and Vadhan, who prove that under certain constraints simple universal functions yield values that are highly independent and very close to uniformly random, has to be applied with care: It may not hold if the constraints are not satisfied.
Zusammenfassung (Summary): The contribution of this dissertation is the mathematical analysis of a hashing scheme called cuckoo hashing in combination with simple, efficiently evaluable functions from two hash classes, which we call the multiplicative class and the linear class, respectively. Cuckoo hashing has a clear tendency to work badly with functions from these two classes. To prove this, we investigate the influence of the inner structure of such functions on the behavior of the cuckoo scheme when the …
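The setting described in the abstract (two tables of size m each, one hash function per table, keys inserted into initially empty tables) can be illustrated with a minimal sketch. The abstract does not define the two hash classes, so the forms used here are assumptions based on the usual textbook definitions: a linear class h(x) = ((a·x + b) mod p) mod m and a multiplicative class h(x) = (a·x mod 2^k) div 2^(k−l) with odd a and m = 2^l. All function names and the `max_kicks` cutoff are hypothetical and chosen for illustration only; this is not the thesis's own code or experimental setup.

```python
import random

def make_linear_hash(p, m, rng):
    """Assumed linear class: h(x) = ((a*x + b) mod p) mod m, p prime."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

def make_multiplicative_hash(k, l, rng):
    """Assumed multiplicative class: h(x) = (a*x mod 2^k) div 2^(k-l), a odd."""
    a = rng.randrange(1, 1 << k, 2)  # random odd multiplier
    return lambda x: ((a * x) % (1 << k)) >> (k - l)

def cuckoo_insert_all(keys, m, h1, h2, max_kicks):
    """Insert distinct keys into two tables of size m each.

    Returns the pair of tables, or None if some insertion needs more
    than max_kicks placement steps (treated here as a failure).
    """
    tables = ([None] * m, [None] * m)
    hashes = (h1, h2)
    for key in keys:
        cur, side = key, 0
        for _ in range(max_kicks):
            pos = hashes[side](cur)
            # Place cur in its cell and pick up whatever was stored there.
            cur, tables[side][pos] = tables[side][pos], cur
            if cur is None:
                break          # cell was empty: insertion finished
            side = 1 - side    # evicted key moves to the other table
        else:
            return None        # gave up: insertion failed
    return tables

def lookup(tables, h1, h2, key):
    """A key can only sit in one of its two candidate cells."""
    return tables[0][h1(key)] == key or tables[1][h2(key)] == key

if __name__ == "__main__":
    rng = random.Random(0)
    m = 1024                                 # table size (2^10)
    keys = rng.sample(range(1 << 20), m // 2)  # n = m/2, load factor 1/4
    p = (1 << 31) - 1                        # a prime larger than the universe
    h1 = make_linear_hash(p, m, rng)
    h2 = make_linear_hash(p, m, rng)
    result = cuckoo_insert_all(keys, m, h1, h2, max_kicks=100)
    print("insertion succeeded" if result is not None else "insertion failed")
```

Repeating the run over many random key sets and freshly drawn hash functions gives an empirical failure rate that could be compared against the bounds stated in the abstract; the sketch makes no claim about what that rate will be.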
Similar resources
Image authentication using LBP-based perceptual image hashing
Feature extraction is a main step in all perceptual image hashing schemes, in which robust features will lead to better results in perceptual robustness. Simplicity, discriminative power, computational efficiency and robustness to illumination changes are counted as distinguishing properties of Local Binary Pattern features. In this paper, we investigate the use of local binary patterns for percep...
On risks of using cuckoo hashing with simple universal hash classes
Cuckoo hashing, introduced by Pagh and Rodler [10], is a dynamic dictionary data structure for storing a set S of n keys from a universe U, with constant lookup time and amortized expected constant insertion time. For the analysis, space (2+ε)n and Ω(log n)-wise independence of the hash functions is sufficient. In experiments mentioned in [10], several weaker hash classes worked well; however, ...
Beyond Parity Constraints: Fourier Analysis of Hash Functions for Inference
Random projections have played an important role in scaling up machine learning and data mining algorithms. Recently they have also been applied to probabilistic inference to estimate properties of high-dimensional distributions; however, they all rely on the same class of projections based on universal hashing. We provide a general framework to analyze random projections which relates their st...
Tabulation Based 5-Universal Hashing and Linear Probing
Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed 4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the comp...
A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing
Hashing is a solved problem. It allows us to get constant time access for lookups. Hashing is also simple. It is safe to use an arbitrary method as a black box and expect good performance, and optimizations to hashing can only improve it by a negligible delta. Why are all of the previous statements plain wrong? That is what this paper is about. In this paper we thoroughly study hashing for inte...